Detecting Meaningful Compounds in Complex Class Labels

نویسندگان

  • Heiner Stuckenschmidt
  • Simone Paolo Ponzetto
  • Christian Meilicke
چکیده

Real-world ontologies such as, for instance, those for the medical domain often represent highly specific, fine-grained concepts using complex labels that consist of a sequence of sublabels. In this paper, we investigate the problem of automatically detecting meaningful compounds in such complex class labels to support methods that require an automatic understanding of their meaning such as, for example, ontology matching, ontology learning and semantic search. We formulate compound identification as a supervised learning task and investigate a variety of heterogeneous features, including statistical (i.e., knowledgelean) as well as knowledge-based, for the task at hand. Our classifiers are trained and evaluated using a manually annotated dataset consisting of about 300 complex labels taken from real-world ontologies, which we designed to provide a benchmarking gold standard for this task. Experimental results show that by using a combination of distributional and knowledge-based features we are able to reach an accuracy of more than 90% for compounds of length one and almost 80% for compounds of length two. Finally, we evaluate our method in an extrinsic experimental setting: this consists of a use case highlighting the benefits of using automatically identified compounds for the high-end semantic task of ontology matching.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploiting Associations between Class Labels in Multi-label Classification

Multi-label classification has many applications in the text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...

متن کامل

Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...

متن کامل

Learning low-dimensional representations via the usage of multiple-class labels

Learning to recognize visual objects from examples requires the ability to find meaningful patterns in spaces of very high dimensionality. We present a method for dimensionality reduction which effectively biases the learning system by combining multiple constraints via the use of class labels. The use of extensive class labels steers the resulting lowdimensional representation to become invari...

متن کامل

BCI Competition IV – Data Set I: Learning Discriminative Patterns for Self-Paced EEG-Based Motor Imagery Detection

Detecting motor imagery activities versus non-control in brain signals is the basis of self-paced brain-computer interfaces (BCIs), but also poses a considerable challenge to signal processing due to the complex and non-stationary characteristics of motor imagery as well as non-control. This paper presents a self-paced BCI based on a robust learning mechanism that extracts and selects spatio-sp...

متن کامل

Bagging for One-Class Learning

Consider the following outlier detection problem: suppose you are given an unlabeled data set and make the assumptions that one particular class is well-represented but you have no prior knowledge on how many outliers it contains. This scenario can arise in a variety of real world applications such as detecting intrusions in a network or spotting malignant tumors in medical images. Although con...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016